Abstract 

Translation termination is accomplished by proteins of the Class I release factor family (RF) that recognize stop codons and 
catalyze the ribosomal release of the newly synthesized peptide. Bacteria have two canonical RFs: RF1 recognizes UAA and UAG, 
RF2 recognizes UAA and UGA. Although these two release factors are sufficient for translation termination, 
the eukaryotic organellar RF protein family, which has evolved from bacterial release factors, has expanded considerably, 
comprising multiple subfamilies, most of which have not been functionally characterized or formally classified. Here, we integrate 
multiple sources of information to analyze the remarkable differentiation of the RF family among organelles. We document the 
origin, phylogenetic distribution and sequence structure features of the mitochondrial and plastidial release factors: mtRF1a, 
mtRF1, mtRF2a, mtRF2b, mtRF2c, ICT1, C12orf65, pRF1, and pRF2, and review published relevant experimental data. 
The canonical release factors (mtRF1a, mtRF2a, pRF1, and pRF2) and ICT1 are derived from bacterial ancestors, whereas the 
others have resulted from gene duplications of another release factor. These new RF family members have all lost one or more 
specific motifs relevant for bona fide release factor function but are mostly targeted to the same organelle as their ancestor. 
We also characterize the subset of canonical release factor proteins that bear nonclassical PxT/SPF tripeptide motifs and provide a 
molecular-model-based rationale for their retained ability to recognize stop codons. Finally, we analyze the coevolution of 
canonical RFs with the organellar genetic code. Although the RF presence in an organelle and its stop codon usage tend to 
coevolve, we find three taxa that encode an RF2 without using UGA stop codons, and one reverse scenario, where mamiellales 
green algae use UGA stop codons in their mitochondria without having a mitochondrial type RF2. For the latter, we put forward 
a "stop-codon reinvention" hypothesis that involves the retargeting of the plastid release factor to the mitochondrion. 
Introduction 

Mitochondria and plastids translate their own genetic mate- 
rial. Even though the number of protein coding genes in these 
organelles can be quite limited, ranging from three genes in 
the mitochondria of apicomplexa (Hikosaka et al. 2010) to 
273 in the chloroplasts of Pinus koraiensis (Noh et al. 2007), 
their translation involves many molecular players (rRNAs, 
tRNAs, aminoacyl-tRNA synthetases, ribosomal protein 
subunits and translation initiation, elongation, and termination 
factors), with at least 150 proteins having been implicated in 
translating human mitochondrial mRNAs (Rotig 2011). One 
protein group that is essential for translation is the Class I 
Release Factor family. These recognize the stop codon at the 
ribosomal A-site, upon which they hydrolyze the ester-bond 
that connects the nascent polypeptide to the last tRNA in the 
ribosomal P-site, thus releasing the newly synthesized protein 
(Petry et al. 2008). Although cytosolic translation involves a 
single peptide chain release factor — eRF1 — of archaeal origin 
(Moreira et al. 2002) that decodes all three stop codons 



(Frolova et al. 1994), organellar translation termination, just 
like bacterial translation termination, employs two 
codon-specific release factors: RF1 recognizes UAA and UAG, and 
RF2 recognizes UAA and UGA (Scolnick et al. 1968). 
Mitochondrial and plastidial versions of RF1 and RF2 
(mtRF1a, mtRF2a, pRF1, and pRF2) have been described, 
and some (mtRF1a and pRF2) have been functionally 
characterized (Meurer et al. 2002; Soleimanpour-Lichaei et al. 
2007). But besides these, five other eukaryotic protein families 
have been recognized as putative members of the organellar 
release factor family: mtRF1, mtRF2b, mtRF2c, ICT1, and 
C12orf65 (Raczynska et al. 2006; Chrzanowska-Lightowlers 
et al. 2011). 
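
The codon specificities described above can be written down as a small lookup. The following Python sketch is purely illustrative and is not part of the original analysis; the names `RF_SPECIFICITY`, `decoders`, and `required_factors` are hypothetical. It reports which factor can read each stop codon and which factors are indispensable for a given set of stop codons in use.

```python
# Codon specificities as stated in the text (RF1: UAA/UAG, RF2: UAA/UGA).
# All names below are hypothetical and chosen for illustration only.
RF_SPECIFICITY = {
    "RF1": {"UAA", "UAG"},
    "RF2": {"UAA", "UGA"},
}


def decoders(stop_codons):
    """Map each stop codon (DNA or RNA alphabet) to the factors able to read it."""
    result = {}
    for codon in stop_codons:
        rna = codon.upper().replace("T", "U")
        result[rna] = sorted(
            rf for rf, codons in RF_SPECIFICITY.items() if rna in codons
        )
    return result


def required_factors(stop_codons):
    """Return factors that are the only decoder of at least one codon in use."""
    return {rfs[0] for rfs in decoders(stop_codons).values() if len(rfs) == 1}


if __name__ == "__main__":
    # A genome that stops only at TAA and TAG needs RF1 but not RF2.
    print(required_factors(["TAA", "TAG"]))
    # A genome using all three standard stop codons needs both factors.
    print(required_factors(["TAA", "TAG", "TGA"]))
```

Under this simplification, a genome that uses only UAA can be served by either factor, so neither is flagged as indispensable.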

Assigning proteins to the release factor family has mostly 
been done automatically, based on their homology to known 
RFs, and, with the exception of ICT1, the molecular functions 
of the noncanonical RFs remain unknown. Nevertheless, the 
individual domains and sequence motifs within the RFs have 
been experimentally well characterized. Bona fide release 
factors exhibit two catalytic domains: the Codon Recognition 
(CR) domain, composed of the helix alpha-5 and the "anticodon" 
tripeptide motif (PxT in RF1 and SPF in RF2); and 
the peptidyl-tRNA hydrolase (PTH) domain, characterized by 
its universally conserved GGQ motif (Seit-Nebi et al. 2001). 
Extrapolating the function of a protein based on the pres- 
ence/absence patterns of these domains was successful in the 
case of ICT1. This protein lacks the CR domain but contains 
the PTH one, and accordingly, was experimentally shown to 
have a codon-independent release factor activity (Richter 
et al. 2010). 
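
The presence/absence reasoning above can be sketched as a simple motif scan. This is a minimal illustration under strong assumptions, not the procedure used in this study: the function names are hypothetical, the test sequences are invented, and a plain regular-expression search ignores alignment position, so a short pattern such as PxT will match by chance in many unrelated proteins. In practice, motifs should be read from the aligned CR and PTH domain columns.

```python
import re

# Motifs named in the text. Matching them anywhere in a sequence is a
# simplifying assumption; real classification relies on aligned positions.
MOTIFS = {
    "PxT": re.compile(r"P.T"),  # RF1-type codon-recognition tripeptide
    "SPF": re.compile(r"SPF"),  # RF2-type codon-recognition tripeptide
    "GGQ": re.compile(r"GGQ"),  # peptidyl-tRNA hydrolase motif
}


def motif_profile(seq):
    """Return which of the three motifs occur in a protein sequence."""
    seq = seq.upper()
    return {name: bool(pattern.search(seq)) for name, pattern in MOTIFS.items()}


def predict_activity(seq):
    """Naive functional guess from the motif profile (illustrative only)."""
    profile = motif_profile(seq)
    if not profile["GGQ"]:
        return "no predicted release activity"
    if profile["PxT"] or profile["SPF"]:
        return "codon-dependent release"
    return "codon-independent release"


if __name__ == "__main__":
    # Invented toy sequences, not real proteins.
    for toy in ("MKPATLLGGQHV", "MKLLSPFAAGGQ", "MKLLAAGGQHV", "MKLLAAHV"):
        print(toy, predict_activity(toy))
```

The third toy sequence mirrors the ICT1 case described in the text: a GGQ motif without a codon-recognition tripeptide.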

Tightly linked to translation termination is the genetic 
code, particularly the identity of the nonsense codons used 
to stop translation. This is especially important for organellar 
genomes, because many of them exhibit deviations from the 
standard genetic code (reviewed in Sengupta et al. 2007; 
Ohama et al. 2008; Watanabe 2010). The most common 
deviations involve a nonsense codon reassignment, in which 
a stop codon (most frequently TGA, but there are also 
a few reports of TAA [Jacob et al. 2009] and TAG 
[Hayashi-Ishimaru et al. 1996]) is reassigned to code for an 
amino acid or is simply not used at all (reviewed in Knight 
et al. 2001). The first case of such a reassignment was re- 
ported in 1979 for the human mitochondrion, whose TGA 
codes for tryptophan (Barrell et al. 1979). Over time, as more 
mitogenome sequences were published, this emerged as the 
standard mitochondrial genetic code, not only for animals 
but also for fungi and most green algae and protists 
(Sengupta et al. 2007). Nevertheless, accurately predicting a 
genome's genetic code, and specifically its stop codons, is not 
trivial. In fact, the genetic code of the human mitochondrion 
was fully resolved only in 2010 (Temperley et al. 2010), 
whereas that of many other organisms remains unknown. 

Nearly one decade after the discovery of the mitochondrial 
TGA reassignment, Lee et al. (1987) published the first report 
of the coevolution of the mitochondrial genetic code with its 
termination factors, reporting that the lack of usage of UGA 
as a stop codon in the rat's mitochondrion coincided with the 
absence of a mitochondrial type RF2. Since then, similar 
trends have been noted in other organisms (Askarian-Amiri 
et al. 2000; Meurer et al. 2002; Heidel and Glockner 2008), 
adding to the hypothesis that the presence of codon-specific 
release factors in the organelle has coevolved with its genetic 
code (Jukes and Osawa 1990). However, no systematic studies 
to corroborate this theory have been done so far, and at least 
one instance has been reported, in the social amoeba 
Dictyostelium fasciculatum, where RF2 is retained and 
expressed, despite the lack of TGA stop codons (Heidel and 
Glockner 2008), offering an interesting evolutionary scenario 
that could represent a transition state in switching between 
genetic codes. The mechanisms responsible for these reassign- 
ments have not been unequivocally established, but it has 
been proposed that the stop codons' scarcity (used only once 
per gene) together with the possibility of fast changes in 
release factors (for example, if an RF is deleted as a result of 
genomic streamlining or if a mutation inactivates it) might 
play an important role (Osawa et al. 1992). 



There are very few studies characterizing the organellar 
members of the RF protein family. Most reports focus 
either on the prokaryotic proteins or describe a particular 
organellar RF (e.g., mtRF1a, ICT1, C12orf65, pRF2) (Meurer 
et al. 2002; Soleimanpour-Lichaei et al. 2007; Antonicka 
et al. 2010; Richter et al. 2010). A large-scale systematic 
analysis of the whole RF protein family across all eukaryotes and 
for all organellar types, allowing the detection of general 
trends in organellar RF evolution, has not been published. 
Similarly, most studies correlating the RFs with the 
organellar genetic code have focused on the metazoan mito- 
chondrial genetic code (Knight et al. 2001), leaving this 
coevolution hypothesis largely untested for most other 
taxon groups and other organelle types. Here, we classify 
and describe the nine distinct subfamilies of organellar release 
factors by combining large-scale phylogenetic analyses with 
protein function and localization data, the genetic code of 
organellar genomes and empirical knowledge about the role 
of particular motifs within RF domains. This systematic study 
and data integration allow us to document the established 
molecular structure and function of each protein subfamily, 
as well as to trace its phylogenetic origin and evolution 
throughout the eukaryotic tree of life. Furthermore, we eval- 
uate the phylogenetic distribution of the RF subfamilies and 
correlate it with the mitochondrial/plastidial genetic code, 
reporting several instances that clearly illustrate the 
coevolution of the release factors with the organellar genetic 
code. 



Materials and Methods 

Sequence Data Retrieval and Selection 
The sequence dataset used was obtained by retrieving all 
human mtRF1a (GI: 166795303) homologues, using its 
sequence as query seed for a PSI-BLAST (Altschul et al. 1997) 
search of the GenBank nr database, restricted to eukaryotic 
organisms and iterated until convergence. 

The results were manually inspected to remove redundant 
sequences and guarantee the presence of all RF family mem- 
bers. Using as guideline the systematics described by Simpson 
and Roger (2004), the dataset taxonomic coverage was bal- 
anced by removing species from groups that are 
over-represented in the databases, like the fungi/metazoa, 
and keeping and/or manually including species from the 
under-represented taxa like the excavata, alveolata, and 
stramenopiles. We selected only fully sequenced organisms, 
preferably with well-annotated organellar genomes. When 
needed, organism-specific tBLASTn searches were conducted, 
and the relevant homologues were included. 

Prokaryotic homologues of each RF subfamily were 
collected by conducting a BLASTp search of NCBI's RefSeq 
database restricted to bacteria, and the first hit from each of the 21 main 
prokaryotic groups, according to Wu et al. (2009), was 
included in the dataset (see supplementary table 3, 
Supplementary Material online, for the accession numbers 
of the 359 protein sequences used in this study). 
Sequence Alignment, Trimming, and Subfamily 
Classification 

Each main subfamily— RF1, RF2, ICT1, and C12orf65— was 
aligned separately. The considerable sequence divergence 
present between some subfamilies led us to test the 
performance of three alignment algorithms: Muscle (v3.7) (Edgar 
2004), MAFFT with L-INS-i iterative refinement option 
(v6.717b) (Katoh et al. 2005), and ClustalW (v2.0.10) 
(Thompson et al. 1994). After careful visual inspection of 
the alignments and their guide trees, the ClustalW alignment 
was chosen, given that it yielded the best overall alignment of 
the known functional elements. 

For the individual RF1 and RF2 phylogenies, we used BMGE 
(v1.0) (Criscuolo and Gribaldo 2010) to remove ambiguously 
aligned positions. A range of parameter settings was tested, 
and after visual inspection, a 60% gap removal threshold was 
chosen because it yielded the best results relative to the ac- 
curate alignment of the functionally characterized and 
well-conserved domains, while maintaining an acceptable 
number of positions for accurate phylogenetic inference. 
The final RF1 phylogeny contains 148 sequences with 499 
aligned positions, and RF2 contains 74 sequences with 541 
amino acid positions. 
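
To make the gap threshold concrete, the sketch below removes alignment columns whose gap fraction exceeds a cutoff. It is a simplified stand-in written for illustration and does not reproduce the trimming program used here, which applies additional criteria. The function name is hypothetical, and reading "60% gap removal threshold" as "discard columns with more than 60% gaps" is an assumption.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.6, gap_chars="-."):
    """Drop columns whose fraction of gap characters exceeds max_gap_fraction.

    `alignment` is a list of equal-length aligned sequences (strings).
    Hypothetical helper; assumes the threshold is a per-column gap fraction.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    kept = [
        col
        for col in range(length)
        if sum(seq[col] in gap_chars for seq in alignment) / n_seqs
        <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in kept) for seq in alignment]


if __name__ == "__main__":
    toy = ["AC-G", "A--G", "A--T"]  # invented toy alignment
    print(trim_gappy_columns(toy))
```

In the toy alignment, the second and third columns contain two and three gaps out of three sequences, so both exceed the 0.6 cutoff and are removed.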

To visualize and classify the multiple RFs from each species, 
we computed a Neighbor-Joining tree with QuickTree (v1.1) 
(Howe et al. 2002) using all 313 eukaryotic full-length 
sequences. These were aligned through profile to profile 
sequential alignment of the individual subfamilies' alignments 
using ClustalW (v2.0.10) (Thompson et al. 1994) followed 
by one last round of alignment refinement with Muscle's 
refine option. Finally, the whole RF family alignment was in- 
spected and manually adjusted. All alignment visual inspec- 
tions were performed using Jalview (v2.7) (Waterhouse et al. 
2009). All alignment data have been deposited in the Dryad 
repository: doi:10.5061/dryad.2br48. 

Phylogenetic Analysis 

The presence of paralogs in the RF1 and RF2 subfamilies 
(3 and 4, respectively) led us to compute individual 
Bayesian phylogenies to clarify their phylogenetic relation- 
ships. These were computed using PhyloBayes (v3.2e) 
(Lartillot et al. 2009). Two independent chains were run for 
RF1 and RF2, using a C20 empirical profile mixture model of 
amino acid substitution and 4 discrete-rate categories 
Gamma distribution (C20+G4). Convergence of the phylog- 
enies was assessed following the guidelines provided with 
PhyloBayes (maximum difference observed across bipartitions 
between the chains <0.1; maximum discrepancy <0.1; and 
minimum effective size >100 for the variables estimated). 
The final majority-rule posterior consensus tree was obtained 
with a burnin value of 1,000, using every-other tree. 

An individual ICT1 plus C12orf65 phylogeny was not cal- 
culated given that the alignment between these two proteins 
would not yield enough confidently aligned positions to 
obtain a reliable phylogeny (no convergence for a Bayesian 
phylogeny could be obtained). 



Organellar Genetic Code Analysis 
A customized set of Perl scripts was developed to analyze the 
organellar genetic codes. For that, the GenBank files of all 
available mitochondrion and plastid genomes (total of 
2,431 files) were retrieved and all relevant information regard- 
ing the number, identity and neighborhood of the stop 
codons predicted for every ORF was parsed and summarized. 
For sequenced but unannotated mitochondrial genomes, we 
used FACIL to predict the genetic code (Dutilh et al. 2011). 
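
The kind of summary described above can be sketched as follows. The original scripts were written in Perl and parsed GenBank files; this Python version is an independent illustration that takes coding sequences as plain strings, and all function names are hypothetical. It tallies the terminal codon of each ORF and lists in-frame standard stop codons found before the end of an ORF, which may hint at a codon reassignment.

```python
from collections import Counter

STANDARD_STOPS = {"TAA", "TAG", "TGA"}


def codons(cds):
    """Split a coding sequence into complete in-frame codons."""
    seq = cds.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def tally_terminal_codons(cds_list):
    """Count the last codon of each ORF; ORFs of odd length are 'incomplete'."""
    counts = Counter()
    for cds in cds_list:
        if len(cds) < 3 or len(cds) % 3 != 0:
            counts["incomplete"] += 1
        else:
            counts[cds[-3:].upper()] += 1
    return counts


def internal_standard_stops(cds):
    """List (codon_index, codon) for standard stops before the final codon."""
    return [
        (index, codon)
        for index, codon in enumerate(codons(cds)[:-1])
        if codon in STANDARD_STOPS
    ]


if __name__ == "__main__":
    toy_orfs = ["ATGAAATAA", "ATGTGGTAG", "ATGTAA", "ATGA"]  # invented examples
    print(tally_terminal_codons(toy_orfs))
    print(internal_standard_stops("ATGTGAAAATAA"))
```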

Subcellular Localization Data 

To complement our bioinformatics analysis, we conducted a 
scrupulous manual literature search for experimental locali- 
zation data on all release factor family proteins. We gathered 
public large-scale localization datasets from several model 
organisms, namely, Arabidopsis thaliana (Heazlewood et al. 
2004; Dunkley et al. 2006; Zybailov et al. 2008; Olinares et al. 
2010), Caenorhabditis elegans (Li et al. 2009), Homo sapiens 
(Pagliarini et al. 2008), Mus musculus (Kislinger et al. 2006), 
Saccharomyces cerevisiae (Huh et al. 2003), and 
Schizosaccharomyces pombe (Matsuyama et al. 2006), 
which we examined for localization information about the 
RF family proteins (table 1). 

For proteins without experimental localization data, we 
predicted their subcellular targeting using the method imple- 
mented in ConLoc (Park et al. 2009), whose outcome is based 
on the consensus result of 13 on-line localization prediction 
servers. 
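
A consensus over several predictors can be illustrated with a plain plurality vote. The sketch below is an assumption made for illustration: the exact combination rule of the consensus method is not described here, and the function name and labels are hypothetical.

```python
from collections import Counter


def consensus_localization(predictions):
    """Return the most frequent predicted compartment, or None if undecided.

    `predictions` holds one label per predictor; None marks a predictor that
    returned no result. Ties between the top two labels yield None.
    Plain plurality voting is an assumption, not the published scheme.
    """
    votes = Counter(label for label in predictions if label is not None)
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


if __name__ == "__main__":
    # Invented outputs from 13 hypothetical predictors.
    toy = ["mitochondrion"] * 8 + ["plastid"] * 3 + [None] * 2
    print(consensus_localization(toy))
```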

Molecular Modeling 

All models were built using the YASARA molecular modeling 
package (Krieger et al. 2002). The high-resolution structures 
of RF1 bound to the ribosome of Thermus thermophilus (PDB 
entries 3D5A, 3D5B [Laurberg et al. 2008] and PDB entries 
3MR8 and 3MS1 [Korostelev et al. 2010]) were used as 
modeling templates. Loops were modeled by scanning a 
nonredundant subset of the PDB (>8,000 structures) for 
fragments with matching anchor points, a minimal number of 
bumps, and maximal sequence similarity. Side chains were 
added with YASARA's implementation of SCWRL 
(Canutescu et al. 2003), and then the model was subjected 
to an energy minimization with the YASARA2 force field as 
described elsewhere (Krieger et al. 2009). WHAT_CHECK 
(Hooft et al. 1996) validation scores were used to score and 
rank the final models. 

C12orf65 C-Terminal Extension Analysis 
The observation that both C12orf65 and ICT1 share a 
basic-residue rich C-terminal extension, together with the 
recent experimental elucidation of the functional role of 
this extra domain in ICT1's bacterial ortholog YaeJ (Gagnon 
et al. 2012) (see ICT1 section for a detailed discussion) led us 
to analyze the relationship between these extensions. To con- 
firm the homology between these domains and predict 
C12orf65's structure, we used HHpred (Soding et al. 2005) 
(data not shown), confirming that these terminal extensions 
are indeed homologous. Moreover, C12orf65's C-terminal 
extension is predicted to be an alpha-helix, mirroring the 
setting in ICT1's bacterial ortholog. 

Results and Discussion 

To provide an overview of the organellar release factor family, 
we first calculated a simple, yet comprehensive and illustra- 
tive tree of the nine distinct subfamilies (fig. 1). The figure 
shows congruence between tree topology, domain 
architecture, and the presence of functionally relevant motifs, allowing 
the classification of each organism's RFs. 

Release Factor Family Classification: Subcellular 
Localization, Structural Characterization, and 
Phylogenetic Origin 

Three widespread organellar release factor protein families 
have been classified via orthology to their eubacterial coun- 
terparts: the two bona fide release factors RF1 and RF2, and 
the release factor-like ICT1. Although the last one is only 
present in the mitochondrion, RF1 and RF2 include both 
the mitochondrial and the plastidial forms, termed mtRF1a, 
mtRF2a, pRF1, and pRF2, respectively. C12orf65 is another 
frequent mitochondrial release factor-like protein. 
Furthermore, vertebrates possess yet another RF1 homologue 
in the mitochondrion, named mtRF1, and land plants present 
two other RF2 homologues, mtRF2b and mtRF2c, amounting 
in total to nine distinct subfamilies. 

Canonical Release Factors: RF1 and RF2 
Release factor 1 proteins specifically recognize the stop 
codons UAA and UAG, while release factor type 2 proteins 
recognize UAA and UGA. Consistent with their 
codon-specific peptidyl-tRNA hydrolytic function, both RF1 
and RF2 display all three functionally described structural 
features: the codon-recognition (CR) domain with its 
alpha-5 helix and codon-discriminator tripeptide motif — 
PxT in RF1 and SPF in RF2 — and the peptidyl-hydrolase 
(PTH) domain containing the universally conserved GGQ 
motif (table 1). 

Despite having the same domain composition, sharing the 
same molecular function and the significant sequence simi- 
larity — 48% sequence identity between mitochondrial and 
plastidial RF1s and 55% for their RF2 counterparts (calculated 
using the consensus sequences of each subfamily divided by 
their average length) — each subfamily can be distinguished by 
its different phylogenetic origin and subcellular localization. 
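
The identity measure mentioned above (identical positions divided by the average sequence length) can be sketched as follows. This is an illustrative reading of that description rather than the exact computation performed; the function name is hypothetical and the inputs are assumed to be two consensus sequences that have already been aligned to each other.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences over their average length.

    Identical non-gap positions are counted and divided by the mean of the
    two ungapped lengths. Hypothetical helper for illustration only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        a == b and a != gap for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )
    length_a = len(aligned_a.replace(gap, ""))
    length_b = len(aligned_b.replace(gap, ""))
    average_length = (length_a + length_b) / 2
    if average_length == 0:
        raise ValueError("sequences contain no residues")
    return 100.0 * matches / average_length


if __name__ == "__main__":
    # Invented toy pair: 4 identical positions, ungapped lengths 5 and 6.
    print(round(percent_identity("ACDE-F", "ACDQGF"), 1))
```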

mtRF1a and pRF1 

mtRF1a is the most widespread of all organellar release 
factors. Every eukaryotic organism with a mitochondrial 
genome harbors a mitochondrial type RF1 encoded in the 
nucleus (supplementary table 1, Supplementary Material 
online). Consistent with the origin of this organelle, this pro- 
tein evolved from an alphaproteobacterial ancestor, as clearly 
demonstrated in figure 2 by the highly supported clustering of 
the alphaproteobacterium Rhodospirillum rubrum at the 
base of the eukaryotic mtRF1a branch, to the exclusion of 
all other nonalphaproteobacterial prokaryotic sequences. This 
protein has been experimentally well characterized, 
Fig. 1. Full release factor family neighbor-joining tree. This figure presents an overview of the nine RF subfamilies roughly separated by the NJ algorithm, 
recapturing the pattern of sequence motifs characteristic of each protein. It summarizes, in one image, the main sequence features presented by 
individual organisms. Each subfamily branch is highlighted with a different color following the exterior labels. Well-resolved branches from 
well-established taxa were collapsed to improve readability. In these collapsed branches, a representative domain and motif structure is displayed, 
slightly enlarged in order to stand out from other individual results. The following species were chosen as models for these representative domains: 
viridiplantae and land plants = Arabidopsis thaliana; metazoa, vertebrates, and mammals = Homo sapiens; insects = Drosophila melanogaster; and 
fungi = Saccharomyces cerevisiae. (Legend: Pfam domains displayed in front of each leaf, green hexagon = PCRF (peptide chain release factor) and 
dark-blue arrow = RF-1. Superimposed on the Pfam domains are the functionally characterized motifs: purple diamond = alpha5 helix; cyan oval = PxT 
motif; yellow oval = SPF motif; red oval = PExGxS motif; red diamond = RT insert; green diamond = GGQ motif; pink hexagon = C-terminal helix; and 
orange rectangle = ICT1-specific helix.) 
Fig. 2. Bayesian RF1 phylogeny. The two main branches separate the mitochondrial proteins (yellow box) from the plastidial ones (green box). The 
mtRF1 branch nested within mtRF1a is highlighted in purple with a vertical striped pattern. Well-supported branches from well-established taxa were 
collapsed to improve readability (full noncollapsed tree in supplementary fig. 1, Supplementary Materials online). Alphaproteobacteria and cyano- 
bacteria are highlighted in bold. (Colors for collapsed taxa: Blue— bacteria; green— Viridiplantae; red— fungi; yellow— amoebozoa; purple— vertebrates; 
orange — insects; and brown — apicomplexa.) 



particularly the human ortholog. It is a bona fide peptide 
release factor that localizes in the mitochondrion and speci- 
fically releases UAA/UAG, both in vitro and in vivo 
(Soleimanpour-Lichaei et al. 2007; Nozaki et al. 2008). 

pRF1 is ubiquitous in plastid-bearing species: archaeplas- 
tida (plants, red, and green algae), rhizaria, diatoms, apicom- 
plexa, and brown algae (supplementary table 1, 
Supplementary Material online). Deciphering this protein's 
phylogenetic origin is not trivial, mainly because, unlike the 
mitochondrion that has been the result of a single endosym- 
biotic event, plastids have been acquired several times inde- 
pendently. There were at least two single primary 
endosymbioses of a cyanobacterium (e.g., red algae's 
rhodoplasts and land plants' and green algae's chloroplasts from a 
beta-cyanobacterium [Reyes-Prieto et al. 2007], and 
Paulinella's chromatophores from an alpha-cyanobacterium 
[Marin et al. 2005; Yoon et al. 2009]); two secondary 
endosymbioses of algae (a red alga gave rise to, for example, 
apicomplexan apicoplasts and stramenopiles' plastids, 
whereas two green algae gave rise to the plastids of 
euglenophytes and chlorarachniophytes [Baurain et al. 2010; 



Janouskovec et al. 2010]) and even tertiary endosymbiosis 
of haptophytes and diatoms in some plastid-bearing dinofla- 
gellates (for recent reviews see Keeling 2010; Archibald 2012). 

As such, one would expect these multiple origins to be, at 
least partially, recaptured in the phylogeny of the plastidial 
RF1. Indeed, the two beta-cyanobacterial RF1 orthologs, from 
Gloeobacter violaceus and Thermosynechococcus elongatus, 
cluster together in a strongly supported branch, with the 
land plants, green and red algae and a group of other plastid 
bearing organisms (fig. 2). 

On the other hand, the phylogenetic signal in the pRF1 
alignment does not seem to be strong enough to recapitulate 
the red algal secondary origin of the apicomplexan plastids, 
because the apicomplexans group confidently with the green alga 
Chlamydomonas reinhardtii and not with the red alga 
Cyanidioschyzon merolae. The same holds true for the dia- 
toms, brown algae and Emiliania, which cluster with each 
other excluding the red algae (fig. 2). 

Several experimental studies have been published 
regarding pRF1's localization and function. Two independent 
reports show its plastidial localization (Zybailov et al. 2008; 
Fig. 3. Bayesian RF2 phylogeny. The top branch, highlighted in yellow, groups known mitochondrial proteins (mtRF2a), several nonresolved bacterial 
RF2s and, highlighted with a checkerboard pattern, the mtRF2c branch (which has been experimentally shown to localize in the chloroplast). The 
bottom branch, highlighted in green, clusters the plastidial RF2s and mtRF2b (indicated by a vertical strip pattern). Alphaproteobacteria and 
cyanobacteria are highlighted in bold. (Colors for collapsed taxa: blue — bacteria; pink — alphaproteobacteria; green — viridiplantae; yellow — amoebozoa; 
brown— apicomplexa; and cyan— cyanobacteria). (Full noncollapsed tree available in supplementary figure 2, Supplementary Materials online.) 



Olinares et al. 2010), and the molecular function of 
A. thaliana's pRF1 has been experimentally characterized in vivo. Not 
only is it essential for appropriate chloroplast development, 
but it also successfully rescues the temperature-sensitive 
phenotype of Escherichia coli RF1 mutants, showing that this 
protein is indeed a functional translation release factor 
(Motohashi et al. 2007). 

mtRF2a and pRF2 

The mitochondrial mtRF2a has a relatively narrow 
phylogenetic distribution when compared to its mtRF1a counterpart. 
It has been lost at least five times during eukaryotic 
evolution (fig. 4), coevolving with the mitochondrial 
genetic code (see section II. Release factors and the evolution 
of the genetic code). It is only consistently found in strepto- 
phytes (land plants), red algae, dictyosteliida, and some stra- 
menopiles (namely in brown algae, oomycetes and 
Blastocystis). It is absent from animals, fungi and excavata, 
with the exception of the heterolobosean Naegleria gruberi 
(supplementary table 1, Supplementary Material online). 

The expected alphaproteobacterial ancestry of this protein, 
given the endosymbiotic origin of mitochondria, cannot be 
unequivocally established from our RF2 phylogeny (fig. 3). 
Most prokaryotic sequences present in this dataset do not 
form a monophyletic group, being instead all grouped in an 



unresolved branch, containing also most eukaryotic mito- 
chondrial RF2s (fig. 3). 

There are no experimental data on this protein's molecular 
function and localization in eukaryotes. Nevertheless, its E. coli 
ortholog has been thoroughly studied and shown to 
terminate translation by decoding UAA and UGA, both in vitro 
and in vivo (Scolnick et al. 1968; Mora et al. 2003). 

The plastidial RF2's phylogenetic distribution overlaps per- 
fectly with its RF1 counterpart (with the exception of 
Toxoplasma gondii, see below), being ubiquitously present 
both in the primary plastids of land plants, red and green 
algae, and in the secondary plastids of apicomplexan parasites, 
diatoms, and brown algae (supplementary table 1, 
Supplementary Material online). 

As mentioned earlier, the multiple origins of plastids 
complicate the task of tracing the phylogenetic origin of these 
organellar proteins. The cyanobacterial origin of primary plas- 
tids' RF2 is recaptured by the strongly supported grouping of 
the two cyanobacteria within the plastidial branch of this 
phylogeny (fig. 3). No strong conclusions can be drawn re- 
garding the origin of apicomplexan, diatom and brown algae 
secondary plastids given the unresolved phylogenetic 
branches comprising these organisms. 

Contrasting with the lack of published functional data 
about the mitochondrial RF2, the chloroplastidial localization 
of A. thaliana's pRF2 has been experimentally determined 
Fig. 4. Schematic eukaryotic phylogeny displaying, per lineage, the coevolution of the mitochondrial genetic code with the codon-specific RFs. The red 
circle indicates the unique primary endosymbiosis event that originated the mitochondrion. The green algae stop-codon reinvention hypothesis is 
detailed in the gray "zoom-in" area. Species relationships were assembled from two studies: the main tree is based on the consensus tree depicting the 
six main eukaryotic groups from Simpson and Roger (2004) and the green algae lineage is based on the 18S rRNA gene tree published by Worden et al. 
(2009). Branching order is meaningful, but not branch length. Red font highlights the exceptions to the coevolution discussed in the text. The star marks 
the "TGA-stop reinvention" with pRF2 relocalization hypothesis in the green lineage. Question marks are used for uncertain data, and two asterisks 
indicate no mitochondrial genome available. (See supplementary methods, Supplementary Materials online for details about the species used in making 
this figure). 



(Meurer et al. 2002; Zybailov et al. 2008), and its function has 
been shown to be primarily in termination at UGA stop 
codons, but also in the regulation of chloroplastidial protein 
synthesis and stability of UGA-containing mRNAs (Meurer 
et al. 2002). 

Noncanonical RFs 

The remaining five RF subfamilies have lost one or both of the 
structural features that characterize bona fide release factors. 
Their origins are more diverse and, except for ICT1, their 
phylogenetic distribution is not as uniform. Most have not 
been characterized experimentally and for some, their subcel- 
lular localization has not been established (table 1). 



mtRF1 

mtRF1 is probably the most studied nonclassical release 
factor, and yet its molecular function remains undetermined. 
It is the longest protein of the RF family: the human protein 
is 445 amino acids long, whereas mtRF1a is only 380. Its 
C-terminus is remarkably similar to that of bona fide RFs, presenting 
an analogous PTH domain harboring the ultraconserved 
GGQ, but it shows some differences within the codon 
recognition domain that set it apart from the canonical RFs. 
Most notably, it lacks the characteristic PxT motif, displaying 
instead PExGxS (most commonly PEVGLS) (table 1). Another 
intriguing sequence feature is a distinctive RT insert within 
the alpha-5 helix that extends the recognition loop without 
disrupting the overall domain architecture (discussed later). 
This is a vertebrate-specific mitochondrial protein 
(Soleimanpour-Lichaei et al. 2007) and it has been reported 
to have originated by duplication of the mtRF1a gene at the 
root of this clade (Young, Edgar, Murphy, et al. 2010). In our 
figure 2 phylogeny, we observe this protein's branch in an 
unresolved cluster with the vertebrate mtRF1a branch and 
several other metazoa and earlier branching eukaryotes. 
Notwithstanding, its high sequence conservation, together 
with its ubiquitous and exclusive distribution within 
vertebrates, leaves no doubt that this protein arose by mtRF1a 
duplication at the root of the vertebrate lineage. 

Young, Edgar, Murphy, et al. (2010) have suggested that 
mtRF1 could be responsible for decoding the nonstandard 
mitochondrial stop codons, AGG and AGA, predicted to 
terminate numerous vertebrate mitochondrial ORFs. A number of 
observations support this hypothesis. First, it possesses 
the canonical domains involved in peptidyl-tRNA hydrolysis 
and in stop codon recognition (although not with the classi- 
cal tripeptide motif). Second, this protein's origin at the root 
of the vertebrate lineage coincides with the origin of AGG/ 
AGA stop codons, hinting that their roles might be function- 
ally connected. Finally, despite the lack of in vitro release 
activity in response to any potential stop codon 
(Soleimanpour-Lichaei et al. 2007; Nozaki et al. 2008), 
mtRF1 has been argued to possess several structural features 
capable of recognizing adenine as the first base of a stop 
codon (Young, Edgar, Murphy, et al. 2010), linking again 
AGG/AGA codons with this protein. 

Nevertheless, this hypothesis has never been experimentally
confirmed. Moreover, Temperley et al. (2010) demonstrated
that, at least in human, AGG and AGA codons do not
function as stop codons. Instead, they promote a -1 frameshift in
both genes containing AGG and AGA (ND6 and CO1), yielding a
standard TAG stop codon, hence bypassing the need for an
extra RF protein. On the other hand, Young, Edgar, Murphy,
et al. (2010) remark that some vertebrate mitogenomes
present a cytb and/or CO1 gene that does not possess a T
immediately before these "frameshifting codons". In such cases, a -1
frameshift does not generate a standard termination codon,
leaving the mechanism of termination of these genes
unexplained.

To investigate this matter, we used all 1,604 complete
vertebrate mitochondrial genomes deposited at the time of
our analysis in the NCBI's organellar genomes database to
systematically evaluate the origin of AGG/AGA-terminated
orfs and to verify to what extent the postulated -1 frameshift
mechanism would fail to generate a canonical TAG stop.

First, our findings show that the mitochondrial orfs termi- 
nated with AGG/AGA indeed arose at the root of the verte- 
brates, and are not present in any other eukaryotes, which use 
them to code for arginine. Second, there were 1,535 orfs
predicted to stop with AGG/AGA, from 947 distinct species. Of
those, a TAG stop arising from a -1 frameshift could account
for 395 orfs, leaving 1,140 orfs from 808 different vertebrate
species unable to terminate translation with a standard stop
codon. (We also examined whether a -2 frameshift, creating a TAA
stop, could hypothetically solve the issue of "nonterminated"
orfs, but only an extra 188 orfs would be terminated.)
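
The frameshift test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used for the analysis: the function name `classify_terminal_codon`, the category labels, and the toy sequences are hypothetical, and each input is assumed to be the coding-strand DNA of an orf with its final codon included.

```python
from collections import Counter


def classify_terminal_codon(cds: str) -> str:
    """Classify how an orf ending in AGA/AGG could terminate.

    Returns 'not_AGR', 'minus1_TAG', 'minus2_TAA', or 'unresolved'.
    """
    cds = cds.upper()
    if cds[-3:] not in ("AGA", "AGG"):
        return "not_AGR"
    # A -1 frameshift reads the base preceding the final codon
    # together with the first two bases of that codon.
    if cds[-4:-1] == "TAG":
        return "minus1_TAG"
    # A -2 frameshift reads the two preceding bases together with
    # the first base of the final codon.
    if cds[-5:-2] == "TAA":
        return "minus2_TAA"
    return "unresolved"


# Toy sequences (hypothetical, not taken from any real mitogenome).
toy_orfs = {
    "orf_a": "ATGCCTAGA",  # T before AGA: the -1 shift reads TAG
    "orf_b": "ATGCTAAGA",  # TA before AGA: the -2 shift reads TAA
    "orf_c": "ATGCCCAGG",  # neither shift yields a standard stop
}
tally = Counter(classify_terminal_codon(s) for s in toy_orfs.values())
```

Applied to every AGG/AGA-terminated orf, a tally of this kind separates the orfs rescued by a -1 frameshift from those that remain without a standard stop codon.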



Additionally, to gain more insight into this protein's putative
function, we describe in a separate publication
the results of a molecular modeling analysis of the
mtRF1 3D structure. We predict that it is unlikely that mtRF1
recognizes any codon at all, as amino acid substitutions and 
insertions at the codon recognition domain of mtRF1 create 
additional hydrophobic bulk that is highly unlikely to tolerate 
any mRNA in the A site of the ribosome (Huynen et al. 2012). 

ICT1 

ICT1 (immature colon carcinoma transcript-1) is much
shorter than any of the canonical release factors: 206 residues
in human, compared to the 380 residues of mtRF1a. It is
an experimentally confirmed mitochondrial protein that has 
lost both structural elements responsible for the stop codon 
recognition (the C-terminal alpha-5 helix and the tripeptide 
motif), while retaining the GGQ PTH domain (table 1). 
Consistent with this domain composition, Richter et al. 
(2010) have demonstrated that this protein indeed functions 
as a stop-codon independent PTH. Also, they have shown 
that, in human, ICT1 is incorporated into the mitoribosome's 
large subunit, leading to the suggestion that it was recruited 
there in the course of eukaryotic evolution. However, it has
recently been reported that E. coli's ICT1 ortholog, YaeJ, is already
part of the bacterial ribosome, indicating that the ribosomal
location of the orthologous group precedes the origin of
eukaryotes (Handa et al. 2011).

The structure of the mouse ICT1 catalytic domain, comprising the
loop containing the GGQ, has been determined by NMR
spectroscopy, showing an overall topology and structural
framework similar to the PTH domain of Class I RFs, consistent
with its analogous hydrolytic activity. There is nevertheless a
distinguishing feature that sets this domain apart from the one
found in canonical Class I release factors. Handa et al. (2010)
describe a groove formed by an ICT1-specific alpha-helix
inserted between two conserved beta-strands, and they
propose that this element might be a site related to this protein's
specific catalytic activity.

Also, it has been noted that ICT1 presents a C-terminal
extension rich in basic residues, characteristic of many
ribosomal proteins (Brodersen et al. 2002), agreeing with its
ribosomal location. A recent crystal structure of the bacterial ICT1
ortholog YaeJ (Gagnon et al. 2012) reveals that this C-terminal
extension acts as a sensor to detect stalled ribosomes, based
on the occupancy of the mRNA channel in the ribosome.
Upon recognition of an empty mRNA channel, the catalytic
GGQ motif of YaeJ can bind in the peptidyl-transferase center,
resulting in subsequent release of the nascent peptide chain.
Based on these recent findings, it is tempting to speculate that 
ICT1 performs a similar function in mitochondria. 

ICT1's widespread eubacterial distribution (Handa et al.
2011) suggests that this protein is of ancient origin and does not
derive from an RF1 or RF2 gene duplication. Apart from mtRF1a, this
is the only subfamily present in all eukaryotic phyla analyzed
(with a few notable exceptions, namely C. merolae,
Neurospora crassa, Sclerotinia sclerotiorum, and
Phytophthora infestans, as shown in supplementary table 1,
Supplementary Material online). This broad taxonomical
distribution is in accordance with its reported essentiality in
human (Richter et al. 2010).

C12orf65 

The C12orf65 orthologous group provides a similar example
of loss of the two functional elements for stop-codon
recognition, while retaining the catalytic GGQ motif. C12orf65
lacks the ICT1-specific alpha-helix, and accordingly it has
been reported that, contrary to ICT1, this is a mitochondrial
soluble matrix protein that does not exhibit
ribosomal-specific PTH activity (Antonicka et al. 2010).
Note, however, that despite this obvious functional
divergence, ICT1 overexpression partially rescues the biochemical
defect presented by C12orf65-mutated cells, hinting that both
proteins have at least partially overlapping functions in
the mitochondrion. Further evidence for a similar function
of the two proteins comes from the observation that
C12orf65 has a (predicted) C-terminal alpha-helix that is
homologous to a recently described ICT1 C-terminal helix. In
YaeJ, the E. coli ortholog of ICT1, this C-terminal helix, as
described in the previous section, functions in sensing an empty
mRNA channel in the ribosome (Gagnon et al. 2012). The homology
between ICT1 and C12orf65 includes the conservation of
basic residues that in YaeJ interact with ribosomal proteins
and the ribosomal SSU rRNA.

Since only 21 bacterial species from 5 (out of 28) bacterial
groups (BLAST results not shown) harbor a C12orf65
homologue, it is likely that this protein is a eukaryotic invention
derived from a duplication of a canonical RF. The fact that
C12orf65 and ICT1 have lost, relative to canonical RFs, the
same stop-codon recognizing domains, and that both share
the C-terminal alpha-helix that is absent from canonical RFs,
provides a strong argument that C12orf65 is derived from
ICT1, which has a wider phylogenetic distribution.
Nevertheless, our phylogenetic analyses based on the
positions that could confidently be aligned between all organellar
release factors did not show strong support for a direct
relationship between C12orf65 and ICT1.

C12orf65 is notably absent from viridiplantae (land plants 
and green algae), being present in all other eukaryotic taxa 
(supplementary table 1, Supplementary Material online). The 
most parsimonious scenario to explain this absence is that it
originated at the root of the eukaryotes and was
subsequently lost in the green lineage.

mtRF2c and mtRF2b 

Land plants (embryophytes) present two extra RF-like pro- 
teins that are not present in any other organism: mtRF2c and 
mtRF2b. These two proteins have not been experimentally 
studied and their domain divergence and rearrangement 
allows only educated guesses regarding their molecular 
function. 

mtRF2c is much shorter than other plant RF2s (only 257 
residues in A. thaliana), and has lost both the alpha-5 helix
and the stop codon recognizing motif, keeping only the GGQ 
hydrolyzing tripeptide. This protein has never been function- 
ally characterized. 

No rigorous phylogenetic interpretation regarding
mtRF2c's origin can be made from the RF2 phylogeny
presented (fig. 3). Not only is its branch nested within the
unresolved mtRF2a cluster, but also the very long branch
length precludes any significant conclusion.

Contrary to the "mitochondrial localization" suggested by 
its name, mtRF2c has been found experimentally in the chlo- 
roplast of A. thaliana (Zybailov et al. 2008). Therefore, we 
propose this protein to be renamed from mtRF2c to pRF2c, 
to correctly express its subcellular localization, following the 
convention used for the other release factors. 

mtRF2b represents a unique type of release factor given its 
loss of both RF signature-motifs, that is, the GGQ tripeptide 
and the stop-codon recognizing motif. Despite its sequence 
divergence and absence of these two features, this protein has 
retained the overall structure of the two release factor family 
domains (Pfam names RF1 and PCRF) (fig. 1), suggesting that 
this is a genuine member of this family. Also, corresponding 
EST sequences for several land plants are present in NCBI's 
EST database, confirming that this protein is indeed expressed 
and not a pseudogene. 

Despite its "mitochondrial naming," there are no experi- 
mental localization data about this protein. Also, localization 
prediction analysis using ConLoc (Park et al. 2009) gave no 
unambiguous results. Nevertheless, it has been described that 
proteins interacting with organellar multi-subunit complexes 
tend to inherit the subcellular localization of their ancestral 
protein (Szklarczyk and Huynen 2010). The strongly
supported clustering of mtRF2b's branch within the plastidial
branch of our RF2 phylogeny (fig. 3) not only indicates that
this protein originated from a duplication of the land
plants' plastidial RF2, but also suggests that mtRF2b might
be plastidial. Further localization studies are required to
corroborate this prediction. Given the loss of the GGQ motif
from mtRF2b, it is tempting to speculate that this protein 
will not present hydrolytic capabilities. 

Release Factors and the Evolution of the Genetic 
Code 

The coevolution of the genetic code with the release factors 
was proposed by Jukes and Osawa over 20 years ago
(Jukes and Osawa 1990). Nevertheless, its universality has 
never been assessed, and many interesting questions remain 
unanswered: was RF2 lost before or after the stop codon 
reassignment; was it lost once in the common ancestor or 
several times in independent lineages; is it present in species 
that do not use TGA as stop codon, and if so, does it (apart 
from the redundant recognition of UAA) have any other 
function in these organisms? 

We performed a systematic analysis of the organellar ge- 
netic code and the presence of mitochondrial and plastidial 
RF2 (figs. 4 and 5). Based on 95 currently sequenced nuclear 
genomes of organisms with annotated organellar genomes, 
the mitochondrial-type RF2 has been lost five times in evo- 
lution: in kinetoplastids, diatoms, alveolates, at the root of the 
opisthokonta and in the green algae lineage, whereas the 
usage of TGA as stop codon in mitochondrial genomes has 
been lost seven times: not only in the same diatoms, alveolata,
opisthokonta, and green algae, but also in heterolobosea,
rhizaria, and some amoebozoa (fig. 4).

Fig. 5. Coevolution of the plastidial RFs with the plastidial genetic code. The red and green backgrounds mark the red and green plastid lineages,
respectively. The branching order is based on Marin et al. (2005) and Keeling (2010). Blue circles indicate a primary endosymbiosis and blue squares
represent a secondary endosymbiosis event. Red font highlights the cases that represent exceptions to the coevolution of the RF with the plastidial
genetic code, where the pRF2 is present in the genome, but TGA stop codons are not used. The star marks the possible "TGA-stop reinvention."
Question marks are used for uncertain data and two asterisks indicate no whole genome available. (See supplementary methods, Supplementary
Materials online for details about the species used in making this figure.)

Plastidial RF2s and genomes show less volatility than mi- 
tochondrial ones. pRF2 has only been lost once (in T. gondii) 
while TGA as a stop-codon was lost twice: in T. gondii (agree- 
ing with a previous report [Denny et al. 1998]) and at the root 
of the green algae lineage (fig. 5). 

In almost all cases, the RF2 loss coincides with the lack of 
usage of TGA stop codons, supporting the coevolution
hypothesis. For example, Phaeodactylum tricornutum and
Thalassiosira pseudonana have lost mtRF2a and they both
lack mitochondrial genes terminated by TGA; T. gondii has
lost pRF2 and accordingly none of its 26 plastidial orfs is
predicted to stop with a TGA codon (supplementary
table 4, Supplementary Material online). Nevertheless, four
exceptions were found, which are briefly discussed in the
following two subsections.
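
The presence/absence comparison behind this coevolution test can be illustrated with a small Python sketch. It is a hypothetical restatement of the logic, not the code used for figures 4 and 5: the `Organelle` record and the state labels are assumptions, and the four example entries simply restate observations reported in the text of this section.

```python
from typing import NamedTuple


class Organelle(NamedTuple):
    taxon: str
    has_rf2: bool        # an organelle-type RF2 is encoded in the nuclear genome
    uses_tga_stop: bool  # at least one organellar orf is predicted to end in TGA


def coevolution_state(o: Organelle) -> str:
    """Place one organelle in one of the four RF2/TGA combinations."""
    if o.has_rf2 and o.uses_tga_stop:
        return "consistent: RF2 and TGA stop retained"
    if not o.has_rf2 and not o.uses_tga_stop:
        return "consistent: RF2 and TGA stop both lost"
    if o.has_rf2:
        return "exception: RF2 without TGA stop"
    return "exception: TGA stop without RF2"


examples = [
    Organelle("D. discoideum mitochondrion", True, True),
    Organelle("T. gondii plastid", False, False),
    Organelle("N. gruberi mitochondrion", True, False),
    Organelle("O. tauri mitochondrion", False, True),
]
states = {o.taxon: coevolution_state(o) for o in examples}
```

Only the two "exception" states contradict strict coevolution, and they correspond to the two subsections that follow.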

RF2 Without UGA 

Most archaeplastida organisms use the same standard genetic 
code in both organelles. Green algae represent the exception 
to this pattern. Despite maintaining pRF2, C. reinhardtii,
Ostreococcus tauri, Micromonas sp., and Bathycoccus sp. do 
not use TGA as a stop codon in their chloroplast genome 
(fig. 5). In our dataset, only Micromonas pusilla retains one 
gene (PsbL) that is terminated with TGA, hence likely using its 
plastidial RF2. 



Another such case is observed in the mitochondrion of the 
heterolobosean N. gruberi (fig. 4). This species still has a mi- 
tochondrial RF2, despite not using TGA stop codons in any of 
its 46 mitochondrially encoded genes. 

Particularly intriguing is the scenario displayed by the three 
social amoebae included in our study (fig. 4). While all three 
encode mtRF2a, only Dictyostelium discoideum has retained 
the usage of TGA stop codons (in the two nonhypothetical
orfs rpl5 and rpl16). On the other hand, D. fasciculatum
and Polysphondylium pallidum do not possess any
TGA-terminated mitochondrial genes, and P. pallidum's
mitogenome is even predicted to use TGA-encoded tryptophan in five
protein coding sequences (rps3, rpl16, rps8, orf919, and orf83).
If P. pallidum's mitogenome truly uses TGA to code for W, 
this would be an exceptional setting where the same codon 
could be decoded both by a cognate tRNA and a release 
factor. 

To evaluate the plausibility of this scenario, we compared 
the sequences of the five peptides containing TGA encoded 
tryptophan to their orthologous sequences in closely related 
species, and found no strong arguments that this would be 
the case. First, TGA codons are only used in five orfs, only
once per orf, and in all of them the codon is located near the
predicted termination codon: 13 amino acids from TAA in
rps3, 7 residues from TAG in rps8, immediately before TAA in
rpl16, and 5 and 2 amino acids from TAA in orf83 and orf919,
respectively. Second, for the three nonhypothetical orfs,
neither the tryptophan nor the small protein "extensions"
(created by including W and the following residues until the
annotated stop codon) are conserved in the orthologous 
proteins from other dictyosteliida species. Together, these 
observations suggest that P. pallidum uses TGA stop 
codons, hence making use of its mtRF2a specificity, mirroring 
what is observed in D. discoideum. Moreover, as long as RF2 is
not lost, the reassignment of the nonsense codon to tryptophan
would be hard to establish, since the cognate tRNA would have to
compete with the release factor. This is in line with the idea
that as long as a stop codon is recognized by an RF, it cannot
become reassigned to code for
an amino acid (Osawa et al. 1992). 
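
The positional argument used above (every in-frame TGA lies close to the annotated stop) can be checked with a short Python sketch. The function name `inframe_tga_distances` and the toy sequences are hypothetical, and the distance is defined here, as an assumption, as the number of codons lying between the TGA and the annotated stop codon.

```python
from typing import List


def inframe_tga_distances(cds: str) -> List[int]:
    """Distances of in-frame TGA codons from the annotated stop codon.

    `cds` is the coding-strand DNA of an orf, annotated stop included.
    A distance of 0 means the TGA sits immediately before the stop.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    last = len(codons) - 1                    # index of the annotated stop
    return [last - i - 1 for i, codon in enumerate(codons[:-1]) if codon == "TGA"]


# Toy orfs (hypothetical sequences, not the P. pallidum genes).
near_stop = inframe_tga_distances("ATGAAATGAGGGCCCTAA")  # TGA, two codons, then TAA
adjacent = inframe_tga_distances("ATGTGATAA")            # TGA immediately before TAA
```

Small distances across all orfs, as reported for P. pallidum, are what one would expect if the TGA codons were the true stops and the downstream TAA/TAG codons were annotation artifacts.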

These repeated instances of TGA loss without loss of RF2,
together with the significant TGA-stop reduction in most
organisms (supplementary table 4, Supplementary Material
online), suggest that alternative genetic codes might arise
more commonly by disappearance of the stop codon first,
followed only then by loss of its respective release factor.

UGA without RF2 

The mitochondrial genomes of mamiellales algae display the
opposite scenario: UGA stop codons have been retained, but
there is no mtRF2a to decode them (fig. 4). Ostreococcus tauri
has six predicted orfs ending with TGA, two of which encode
nonhypothetical proteins (Rpl5 and Rps8) that would be
extended by 25 and 45 amino acids, respectively, if they
were to use the next in-frame non-TGA stop codon. These
potential "extensions" are not present in any other members
of the Rpl5 and Rps8 gene families (data not shown).
Bathycoccus sp. has one nonhypothetical open reading
frame (Rpl16) that ends in TGA, while its Rps8 gene
terminates with a TAA that perfectly aligns with the TGA in
Ostreococcus' Rps8. The most parsimonious scenario to
explain this TGA-stop reusage in Ostreococcus and Bathycoccus
would be that early in the evolution of green algae the
mitochondrial RF2 was lost concomitantly with the usage of TGA
stop codons in the mitogenome, followed by a later
"reinvention" of TGA as stop in those two species (fig. 4).

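The hypothetical "extension" of a TGA-terminated orf, that is, the number of extra residues added if translation continued to the next in-frame non-TGA stop, can be computed as in the following Python sketch. The function name `readthrough_extension` and the toy sequences are hypothetical, and counting the residue decoded at the TGA itself as part of the extension is an assumption of this sketch.

```python
from typing import Optional

NON_TGA_STOPS = {"TAA", "TAG"}


def readthrough_extension(seq: str, tga_start: int) -> Optional[int]:
    """Count codons translated if the TGA at `tga_start` were read through.

    `seq` is genomic DNA on the coding strand and `tga_start` is the
    0-based index of the terminal TGA. The count includes the TGA itself
    and stops before the next in-frame TAA/TAG. Returns None if no such
    stop is found before the end of the sequence.
    """
    seq = seq.upper()
    if seq[tga_start:tga_start + 3] != "TGA":
        raise ValueError("no TGA codon at the given position")
    extension = 0
    for i in range(tga_start, len(seq) - 2, 3):
        if seq[i:i + 3] in NON_TGA_STOPS:
            return extension
        extension += 1
    return None


# Toy example (hypothetical sequence): TGA, two sense codons, then TAA.
toy_extension = readthrough_extension("ATGAAATGAGGGCCCTAA", 6)
```

Comparing such extensions with orthologous proteins, as done above for Rpl5 and Rps8, indicates whether the extra residues are likely to be real.
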
Favoring this hypothesis is the fact that, even though the 
usage of TGA stop codons was lost earlier in green algae 
evolution, this codon has not been reassigned as a 
sense-codon in the earlier branching green algae — TGA is 
simply not used in the mitochondrial genomes of 
Chlamydomonas and Micromonas — facilitating the reversion 
to the ancestral state. Nevertheless, this TGA-stop reusage 
requires the presence of a release factor capable of recognizing 
it, and mtRF2a has been lost at the root of the green algae, 
leaving open the question: how can these algae decode TGA 
stop codons in their mitochondria? 

One possibility would be to retarget the plastidial RF2 to 
the mitochondrion. This retargeting would explain not only 
the ability to decode TGA stops in the mitochondrion but 
also the conservation of pRF2 in green algae, which do not use 
TGA stop codons in their chloroplast genome (see previous 
section). Other examples of multiple subcellular targeting
have been described in organisms with multiple
genome-containing organelles, like the apicomplexa (e.g.,
Plasmodium falciparum and T. gondii [Pino et al. 2007;
Ralph 2007]) and A. thaliana (Duchene et al. 2009).

Noncanonical Motifs 

RF1 and RF2 protein family members have been primarily 
classified based on the identity of the two experimentally 
characterized tripeptide motifs — PxT in RF1 and SPF in 
RF2 — which confer their distinct codon specificity. Despite 
their nearly universal conservation, we came across 13 
nonclassical motifs: 10 noncanonical PxT and 3 noncanonical 
SPF (fig. 1 and supplementary table 2, Supplementary 
Material online). 

The three nonclassical RF2s have an SPY motif (Babesia
bovis, Ectocarpus siliculosus, and Erythrobacter litoralis), 
which is rare in organellar RFs but is present in nearly one- 
third of eubacteria (data not shown), immediately suggesting 
that this variability does not affect its function. Also, this 
phenylalanine (F) to tyrosine (Y) change in the third position 
of the motif is not disruptive given that the amino acid di- 
rectly involved in the discriminatory role of RF2 is the first 
residue from the tripeptide (serine) and not the third (obser- 
vation of SBN). 

Of the 10 noncanonical PxT motifs, 9 are PxN and 1 is
PTS (supplementary table 2, Supplementary Material online).
Most of the PxN motifs (7 out of 9) sit on a recognition loop
that is 2 amino acids shorter and that, despite its unusual
features, has been experimentally tested in C. elegans, displaying
full UAA/UAG-specific release activity, both in vitro and in vivo
(Young, Edgar, Poole, et al. 2010). This, together with the fact that
this novel shorter loop has arisen at least three times
independently in evolution (in metazoa, stramenopiles, and
apicomplexa), suggests that it might represent a viable
alternative conformation.

To better understand the retained functionality of this 
alternative loop conformation, we have built a molecular 
model of C. elegans' mtRF1a (with its PVN motif and shorter
recognition loop). To do so, we used T. thermophilus' crystal 
structure of RF1 bound to a ribosome with a UAA stop codon 
in the A-site (fig. 6A). Our model clearly shows that, despite 
the shortened recognition loop, the tripeptide's asparagine 
(N) is still able to make the crucial hydrogen bonding inter- 
action to the first nucleotide of the stop codon (fig. 6B), just 
like the threonine in the canonical PxT motif, which deter- 
mines selectivity over other nucleotides (Korostelev et al. 
2008; Laurberg et al. 2008). 

Despite the over-representation of shorter nonclassical 
loops, there were two PxN motifs in proteins with full-sized 
recognition loops, that is, without any post-motif deletions: 
P. falciparum's pRF1 (PKN) and Cryptococcus neoformans' 
mtRF1a (PAN). Again we computed a molecular model for
this alternative structure (not shown), this time using
Cryptococcus's mtRF1a sequence. In the model, we
unequivocally observe the H-bond between the tripeptide's
asparagine (N) and the first U from the stop codon. This explains the
published experimental evidence that full-length recognition 
loops with a noncanonical tripeptide PxN are also capable of
codon-specific release activity in a C. elegans bacterial
chimeric system (Young, Edgar, Poole, et al. 2010).

Fig. 6. Molecular modeling of the noncanonical PxT motif from Caenorhabditis elegans' mtRF1a. (A) Hydrogen bonding interaction between the first
nucleotide of the UAA stop codon (U1) and the threonine of the PxT motif (labeled Thr) of the reading head of RF1 in the Thermus thermophilus crystal
structure (PDB entry 3D5A [Laurberg et al. 2008]). (B) Molecular model of the reading head conformation in the C. elegans RF1. The asparagine of the
PxN motif is capable of making a similar hydrogen bonding interaction to the first nucleotide of the stop codon as a result of a two amino acid deletion
in the recognition loop. The first two stop codon nucleotides (U1 and U2) are shown in green in both panels.

The only non-PxN motif found in our dataset belongs to 
B. bovis, which has a PTS tripeptide motif in a full-length 
recognition loop. Based on their biochemical and structural 
similarity, this threonine to serine substitution in the third 
position of the motif, most likely, maintains the same 
intra-molecular interactions, representing a nondisruptive 
substitution (observation of SBN). 
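
The motif classes discussed in this section can be summarized as a simple pattern lookup. The Python sketch below is only an illustration: the function name `classify_motif` and the class labels are hypothetical, and the input is assumed to be the tripeptide already extracted from an aligned recognition loop, so it says nothing about loop length.

```python
import re

# Ordered (label, pattern) pairs; "." stands for the variable middle residue x.
MOTIF_CLASSES = [
    ("canonical PxT (RF1-type)", re.compile(r"P.T")),
    ("nonclassical PxN (RF1-type)", re.compile(r"P.N")),
    ("nonclassical PTS (RF1-type)", re.compile(r"PTS")),
    ("canonical SPF (RF2-type)", re.compile(r"SPF")),
    ("nonclassical SPY (RF2-type)", re.compile(r"SPY")),
]


def classify_motif(tripeptide: str) -> str:
    """Assign a codon-recognition tripeptide to one of the classes above."""
    t = tripeptide.upper()
    if len(t) != 3 or not t.isalpha():
        raise ValueError("expected a three-letter amino acid motif")
    for label, pattern in MOTIF_CLASSES:
        if pattern.fullmatch(t):
            return label
    return "unclassified"


# Motifs named in the text: PVN (C. elegans), PKN (P. falciparum),
# PAN (C. neoformans), PTS (B. bovis), and SPY.
observed = {m: classify_motif(m) for m in ["PVN", "PKN", "PAN", "PTS", "SPY"]}
```

A lookup of this kind only sorts sequences into classes; whether a given nonclassical motif retains function still depends on the structural and experimental evidence discussed above.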



One "Extra" ICT1 without GGQ 

The phylogenetic distribution of RF proteins can provide valu- 
able clues about putative functional interactions. We ana- 
lyzed all cases where a particular species displayed a RF 
presence/absence pattern that deviates from the trend of 
the taxonomical group. Despite the several interesting cases 
found (for details see supplementary table 1, Supplementary 
Material online), there is just one for which we could find 
convincing evidence that the departure might be functionally 
relevant, that is, where the same pattern has been found in 
related species, and is thus unlikely to be a sequencing error or a
pseudogene.

Ixodes scapularis, the black-legged tick, seems to have
gained an additional ICT1-related protein, while losing
C12orf65. This extra ICT1-like protein is approximately the
same size as the canonical one (171 vs. 166 residues,
respectively) but has lost the GGQ motif, setting it apart from
classical ICT1s, and possibly conferring a different molecular
function. Also, the full-length peptide is expressed in I. scapularis
and other Ixodidae family members (e.g., I. nanus,
Rhipicephalus microplus, and Tetranychus urticae) (data
from NCBI's EST database), further supporting its credibility
as a "real protein." Further experimental studies are needed
to shed some light on this putative "novel" ICT1-like protein.

Conclusion 

Organellar translation termination is still far from understood. 
Although cytosolic translation employs a single release
factor (eRF1) belonging to a highly conserved protein family,
the organellar release factors comprise nine subfamilies. This
protein family seems to be particularly prone to undergo
genetic expansion and functional divergence. In fact, this
trend can also be observed in bacteria. Apart from RF1,
RF2, and YaeJ (ICT1's bacterial ortholog), at least one other
duplicated bacterial RF gene, E. coli's prfH, has been
documented and proposed to be one more member of the Class I
release factor family in bacteria (Baranov et al. 2006).

Despite the loss and/or departure from canonical motifs in 
some RFs, these subfamilies can still be recognized as release 
factors, suggesting conservation of structure and a possible 
interaction with the ribosome. Nevertheless, experimental 
characterization of each subfamily's specific function is
paramount. For example, it would be interesting to experimentally
assess the molecular function of the RFs that have lost all
functionally characterized motifs (such as the mtRF2b plant
subfamily or the ICT1-like protein from I. scapularis) to evaluate
the effects of such sequence divergence on translation
termination. Also, it is necessary to evaluate how comparable this
process is between organelles and between the same organ- 
elle in different species, given their dissimilar RF content. 

Here, we have paved the way for this experimental char- 
acterization by classifying and highlighting the most striking 
attributes of each main RF subfamily. We have clarified, as far 
as possible, RF1 and RF2 phylogenetic origins and have shown 
that most organellar release factors tend to keep their 
ancestral subcellular localizations: mitochondrial RFs derive
either from alphaproteobacteria (mtRF1a) or from
duplications of canonical mitochondrial RFs (e.g., mtRF1 in
Vertebrates); and RFs from primary plastids originated from 
cyanobacteria (pRF1 and pRF2) or duplications of plastidial 
proteins (mtRF2b proposed to be renamed pRF2b), following 
the observed trend that irrespective of the relocalization of 
the genes, proteins from organellar multi-subunit complexes 
and their interacting partners tend to continue to function in 
their original compartment (Szklarczyk and Huynen 2010). 

Also, we have explored the tight connection between the 
organellar RFs and the identity of the stop codons used, re- 
vealing a picture of dynamic ongoing evolution within this 
protein family. The complementarity observed in green algae 
organelles (where the plastid still retains an RF2 without any
gene predicted to terminate with TGA, and the mitochondrion
has lost the RF2 but still uses TGA stop codons) presents a
fascinating scenario that leads us to propose a stop-codon
"reinvention" with pRF2 relocalization to the mitochondrion 
in the green algae lineage. Notably, this would be an exception 
to the general pattern that proteins that function in a com- 
plex maintain their ancestral subcellular localization. Despite 
the elegance of this hypothesis, it requires experimental val- 
idation before any further conclusions can be drawn from 
such a mechanism. 

Overall, our comprehensive classification of the organellar 
release factor family should serve as a starting point for pri- 
oritization of experimental efforts such that, for each of the 
nine orthologous groups, the subcellular location is unequiv- 
ocally established, and the effects of knockouts/knockdowns 
or site-specific mutagenesis on translation termination are 
measured, better clarifying this essential cellular process. 

Supplementary Material 

Supplementary methods, figures S1-S2, and tables 1-4 are 
available at Molecular Biology and Evolution online
(http://www.mbe.oxfordjournals.org/).